ABSTRACT

The classical back propagation (CBP) method is the simplest algorithm for training feed-forward neural networks
(FFNNs). It uses the steepest descent search direction with a fixed learning rate $\alpha$ to minimize the error function E.
Because $\alpha$ is the same at every iteration, convergence is slow. In this paper I suggest a new formula for computing the
learning rate $\alpha$, based on the hyperbolic tangent function combined with the Wolfe conditions, to accelerate the
convergence of the CBP algorithm. Simulation results are presented and compared with the Matlab toolbox training algorithms.
1. INTRODUCTION

Artificial neural networks have been studied extensively and are widely used in many applications
of artificial intelligence. Because of their excellent capability for self-learning and self-adapting, they have attracted
much interest from the scientific community. Recently, they have been established as vital components of many systems and
are considered a powerful tool for solving different types of problems [5], [19]. There exist many types of neural networks,
e.g. see [7, 12, 17], but the basic principles are similar. Each neuron in the network is able to receive input signals, to
process them and to send an output. Each neuron is connected to at least one other neuron, and each connection is evaluated
by a real number, called the weight coefficient, that reflects the degree of importance of the given connection in the
neural network.

Multi-layer feed-forward neural networks (MLFFs) have been the preferred neural network architecture for the
solution of classification, function approximation and other types of problems, due to their outstanding learning and
generalization abilities. MLFFs allow signals to travel one way only, from the input layer to the output layer. There is no
feedback (loops), i.e. the output of any layer does not affect that same layer. MLFFs tend to be straightforward networks
that associate inputs with outputs [7, 3, 19].

There exist two main types of training process: supervised and unsupervised training. Supervised training means
that the neural network has a known desired output (target) and the weight coefficients are adjusted in such a way that the
calculated and the desired outputs are as close as possible [6]. This paper is concerned with supervised learning; for
unsupervised learning see [9].

The remainder of this paper is organized as follows. Section 2 gives a brief summary of the supervised training
process with the classical back propagation algorithm. Section 3 gives the suggested method and its global convergence
analysis. Section 4 contains the experimental results and finally Section 5 presents the concluding remarks.
2. SUPERVISED TRAINING PROCESS AND CLASSICAL BACKPROPAGATION ALGORITHM


A multi-layer feed-forward network consists of neurons that are ordered into layers. The first layer is called
the input layer, the last layer is called the output layer, and the layers between them are hidden layers. Training such a
network can be formulated as the minimization of an error function E(w), see equation (1) below, that depends on the
connection weights w of the network and is defined as the sum of the squared differences between the computed and target
values:

$$E(w) = \sum_{j=1}^{P} \sum_{i=1}^{M} \left( O_i^j - T_i^j \right)^2 \tag{1}$$


The variables $O_i^j$ and $T_i^j$ stand for the actual and desired response of the i-th output neuron, respectively. The
superscript j denotes the particular learning pattern. The vector w is composed of all weights in the net; the summation of
the errors takes place over all M output neurons and all P learning patterns (X, T), where the N-dimensional vector X is the
input vector and the M-dimensional vector T is the target vector associated with X. The most popular training algorithm for
MLFF neural networks is the classical back propagation [18], which may proceed in one of two basic ways: pattern mode
(on-line) or batch mode. In the pattern mode of CBP learning, weight updating is performed after the presentation of each
training pattern. In the batch mode of CBP learning, weight updating is performed after the presentation of all the training
examples (i.e. after the whole epoch), see [4, 12]. This paper considers batch mode training. The high level description of
the classical back propagation training algorithm for MLFFs is as follows.

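CBP Algorithm

Step (1): Initialize k = 1, $w_1$, $\alpha = 0.01$, error tolerance $E_{rr}$ and $\varepsilon > 0$.

Step (2): If $E(w_k) < E_{rr}$ or $\|\nabla E(w_k)\| < \varepsilon$, terminate.

Step (3): Compute the descent search direction

$$d_k = -g_k \quad \forall k$$

where $g_k = \nabla E(w_k)$ is the gradient vector.

Step (4): Update the weights

$$w_{k+1} = w_k + \alpha d_k$$

Step (5): Set k = k + 1 and go to Step (2).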
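The steps of the CBP algorithm can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Matlab toolbox implementation used in the experiments: the names `cbp_train`, `toy_error` and `toy_grad` are hypothetical, and the toy error (a one-weight linear model fitted to three patterns) stands in for a real network error function.

```python
import math

def cbp_train(error, grad, w, alpha=0.01, err_goal=1e-3, eps=1e-5, max_epochs=2000):
    """Batch steepest descent with a fixed learning rate (Steps 1-5)."""
    for k in range(1, max_epochs + 1):
        g = grad(w)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        # Step (2): stop on a small error or a small gradient norm.
        if error(w) < err_goal or g_norm < eps:
            return w, k - 1          # k - 1 weight updates were performed
        # Step (3): steepest descent direction d_k = -g_k.
        d = [-gi for gi in g]
        # Step (4): w_{k+1} = w_k + alpha * d_k with the same alpha every epoch.
        w = [wi + alpha * di for wi, di in zip(w, d)]
    return w, max_epochs

# Hypothetical toy problem: sum of squared residuals of the model o = w * x.
xs = [0.0, 1.0, 2.0]
ts = [0.0, 2.0, 4.0]                 # targets generated by t = 2 x

def toy_error(w):
    return sum((w[0] * x - t) ** 2 for x, t in zip(xs, ts))

def toy_grad(w):
    return [sum(2.0 * (w[0] * x - t) * x for x, t in zip(xs, ts))]

w_final, epochs = cbp_train(toy_error, toy_grad, [0.0])
```

On this toy problem the fixed rate shrinks the residual by the same factor at every epoch, which is the behaviour the discussion of the fixed learning rate refers to.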

Although the CBP algorithm is a simple learning algorithm, since only two major parameters are used to control the
training process (the first is the learning rate $\alpha$, which is a constant value, and the second is the search direction
$d_k = -g_k$, $\forall k$), it is unfortunately not based on a sound theoretical basis and it has some disadvantages. First,
its convergence is fast only if the parameter setting is nearly optimal; moreover, the convergence rate is slow and decreases
rapidly as the problem size increases. Second, its convergence is guaranteed only if the learning rate is small enough
[2, 10]. The main problem then is to determine a priori what "small enough" means. In other words, for a shallow minimum the
learning rate is often too small, whereas for a narrow minimum it is often too large and the process never converges;
therefore the CBP tends to be inefficient. From the above discussion we see that the learning rate is crucial both for the
convergence and for the speed of the algorithm. Many approaches have been proposed to compute the learning rate, see [1, 11].
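The effect of a fixed learning rate on shallow and narrow minima can be seen on a one-dimensional quadratic. The sketch below is an illustration only: the error $E(w) = \frac{1}{2} c w^2$ and the curvature values 0.1 and 300 are assumptions chosen for the example, not values from the experiments. With a fixed rate each update multiplies w by $(1 - \alpha c)$, so the same $\alpha = 0.01$ barely moves in the shallow case and diverges in the narrow one.

```python
def run_fixed_step(curvature, alpha, w0=1.0, steps=50):
    """Steepest descent on E(w) = 0.5 * curvature * w**2 with a fixed step.

    Returns |w| after the given number of updates (the minimum is at w = 0).
    """
    w = w0
    for _ in range(steps):
        grad = curvature * w      # E'(w)
        w = w - alpha * grad      # fixed-rate update
    return abs(w)

alpha = 0.01
flat = run_fixed_step(curvature=0.1, alpha=alpha)      # shallow minimum: factor 0.999 per step
steep = run_fixed_step(curvature=300.0, alpha=alpha)   # narrow minimum: factor -2 per step
```

After 50 updates `flat` is still close to its starting value, while `steep` has grown by many orders of magnitude.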

3. SUGGESTED METHOD AND ITS CONVERGENCE ANALYSIS

3.1 Suggested Method 


After $d_k$ is fixed ($d_k = -g_k$, $\forall k$), the learning rate $\alpha_k$ ideally would solve the following
one-dimensional optimization problem

$$\min_{\alpha > 0} E(w_k + \alpha d_k) \tag{2}$$


This optimization problem is usually impossible to solve exactly. Instead, $\alpha_k$ is computed (via an iterative
procedure referred to as line search) either to approximately solve the above optimization problem or to ensure a sufficient
decrease in the value of E. In order to avoid solving equation (2), we need to find $\alpha_k$ such that the following Wolfe
conditions hold:

$$E(w_k + \alpha_k d_k) - E(w_k) \le c_1 \alpha_k g_k^T d_k \tag{3}$$

$$g(w_k + \alpha_k d_k)^T d_k \ge c_2 g_k^T d_k \tag{4}$$

where $0 < c_1 < c_2 < 1$.
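The two Wolfe conditions are easy to check numerically for a given trial step. The following sketch assumes the error and gradient are available as Python callables on weight lists; the function name `satisfies_wolfe` and the toy error $E(w) = w^2$ are hypothetical, while the defaults $c_1 = 10^{-4}$ and $c_2 = 0.5$ are the values reported in Section 4.

```python
def satisfies_wolfe(error, grad, w, d, alpha, c1=1e-4, c2=0.5):
    """Return (sufficient_decrease, curvature) for step length alpha along d."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    w_new = [wi + alpha * di for wi, di in zip(w, d)]
    slope0 = dot(grad(w), d)                              # g_k^T d_k, negative for a descent direction
    cond3 = error(w_new) - error(w) <= c1 * alpha * slope0   # first Wolfe condition (3)
    cond4 = dot(grad(w_new), d) >= c2 * slope0               # second Wolfe condition (4)
    return cond3, cond4

# Hypothetical toy error E(w) = w^2 with gradient 2w, started at w = 1.
def sq_error(w):
    return w[0] ** 2

def sq_grad(w):
    return [2.0 * w[0]]

w0, d0 = [1.0], [-2.0]                                    # d0 = -gradient at w0
good = satisfies_wolfe(sq_error, sq_grad, w0, d0, 0.5)    # lands on the minimum
too_long = satisfies_wolfe(sq_error, sq_grad, w0, d0, 1.0)
too_short = satisfies_wolfe(sq_error, sq_grad, w0, d0, 1e-3)
```

Here the step 1.0 overshoots and fails (3), while the step 0.001 gives a decrease but fails the curvature condition (4), which is what rules out steps that are needlessly short.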


To find such an $\alpha_k$ we shall develop a new procedure which contains two stages. The first stage computes
$\alpha_k$ from the hyperbolic tangent equation

$$\alpha_{k+1} = \frac{1 - e^{-\alpha_k}}{1 + e^{-\alpha_k}} = 2 \left( \frac{1}{1 + e^{-\alpha_k}} \right) - 1, \qquad \text{i.e.} \quad \alpha_{k+1} = 2 S(\alpha_k) - 1 \tag{5}$$

where $S(\alpha)$ is the sigmoid transfer function used in the network. The second stage uses backtracking to satisfy
the Wolfe conditions.
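The first stage can be sketched as below. Note that $2 S(\alpha) - 1 = \tanh(\alpha / 2)$, so for a positive $\alpha_k$ the trial value stays in (0, 1) and is smaller than $\alpha_k$. The starting value 1.0 is a hypothetical choice made only for this illustration, and the function names are not from the paper.

```python
import math

def sigmoid(a):
    """Logistic transfer function S(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def next_learning_rate(alpha_k):
    """Equation (5): alpha_{k+1} = 2 * S(alpha_k) - 1."""
    return 2.0 * sigmoid(alpha_k) - 1.0

alpha = 1.0          # hypothetical starting value, not taken from the paper
trial_rates = []
for _ in range(5):
    alpha = next_learning_rate(alpha)
    trial_rates.append(alpha)
```

Iterating the formula alone produces a decreasing sequence of trial rates; in the proposed method each trial value is then adjusted by the second stage so that the Wolfe conditions hold.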

At this point, we present a high level description of the proposed algorithm (ABBP, say) for neural network training.

Algorithm (ABBP)

Step (1): Initialize k = 1, $w_1$, $0 < c_1 < c_2 < 1$, error tolerance $E_{rr}$ and $\varepsilon > 0$.

Step (2): If $E(w_k) < E_{rr}$ or $\|\nabla E(w_k)\| < \varepsilon$, terminate.

Step (3): Compute the descent search direction

$$d_k = -g_k \quad \forall k$$

Step (4): Calculate the learning rate $\alpha_k$ using

$$\alpha_k = \begin{cases} \alpha_1 \ \text{(initial value)}, & k = 1 \\ 2 S(\alpha_{k-1}) - 1, & \text{otherwise} \end{cases}$$

Step (5): Use a backtracking line search to satisfy the Wolfe conditions (3) and (4) and accept $\alpha_k$.

Step (6): Update the weights

$$w_{k+1} = w_k + \alpha_k d_k$$

Step (7): Set k = k + 1 and go to Step (2).
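A minimal Python sketch of the ABBP steps is given below; it is not the author's Matlab implementation. Several points are assumptions: the initial rate `alpha1 = 1.0` is hypothetical, the previously accepted step is used as $\alpha_{k-1}$ in equation (5), and the line search is a bisection-style variant that shrinks the step when (3) fails and enlarges it when (4) fails, since plain backtracking can only shrink a step and so cannot by itself repair a violated condition (4). The toy error is the same hypothetical one-weight problem used for illustration elsewhere.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def wolfe_search(error, grad, w, d, alpha, c1=1e-4, c2=0.5, max_tries=50):
    """Bisection-style search for a step satisfying conditions (3) and (4)."""
    lo, hi = 0.0, math.inf
    e0, slope0 = error(w), dot(grad(w), d)
    for _ in range(max_tries):
        w_new = [wi + alpha * di for wi, di in zip(w, d)]
        if error(w_new) - e0 > c1 * alpha * slope0:       # (3) fails: backtrack
            hi = alpha
            alpha = 0.5 * (lo + hi)
        elif dot(grad(w_new), d) < c2 * slope0:           # (4) fails: step too short
            lo = alpha
            alpha = 2.0 * alpha if math.isinf(hi) else 0.5 * (lo + hi)
        else:
            break                                         # both conditions hold
    return alpha                                          # last trial if max_tries is exhausted

def abbp_train(error, grad, w, alpha1=1.0, err_goal=1e-3, eps=1e-5, max_epochs=2000):
    alpha = alpha1                                        # hypothetical initial rate
    for k in range(1, max_epochs + 1):
        g = grad(w)
        if error(w) < err_goal or math.sqrt(dot(g, g)) < eps:   # Step (2)
            return w, k - 1
        d = [-gi for gi in g]                             # Step (3)
        if k > 1:
            alpha = 2.0 / (1.0 + math.exp(-alpha)) - 1.0  # Step (4), equation (5)
        alpha = wolfe_search(error, grad, w, d, alpha)    # Step (5)
        w = [wi + alpha * di for wi, di in zip(w, d)]     # Step (6)
    return w, max_epochs

# Hypothetical toy problem: sum of squared residuals of the model o = w * x.
xs, ts = [0.0, 1.0, 2.0], [0.0, 2.0, 4.0]

def toy_error(w):
    return sum((w[0] * x - t) ** 2 for x, t in zip(xs, ts))

def toy_grad(w):
    return [sum(2.0 * (w[0] * x - t) * x for x, t in zip(xs, ts))]

w_final, epochs = abbp_train(toy_error, toy_grad, [0.0])
```

On this toy problem the error goal is reached in a handful of epochs, because the search lengthens the trial step whenever the curvature condition shows it is too short.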

3.2 Convergence Analysis 

In order to establish the global convergence result, the following definition [16] is essential: the angle $\theta_k$
between the search direction $d_k$ and the steepest descent direction $-g_k$ is defined by

$$\cos \theta_k = \frac{-g_k^T d_k}{\|g_k\| \, \|d_k\|} \tag{6}$$


The Zoutendijk theorem [16] is also necessary for establishing the convergence of any line search method.

Zoutendijk theorem [16]

Consider any iteration of the form $w_{k+1} = w_k + \alpha_k d_k$, where $d_k$ is a descent direction, i.e.
$d_k^T g_k < 0$, and $\alpha_k$ satisfies the Wolfe conditions (3) and (4). Suppose that E is bounded below and continuously
differentiable in an open set N containing the level set $\mathcal{L} = \{ w : E(w) \le E(w_1) \}$, where $w_1$ is the
starting weight vector of the iteration. Suppose also that the gradient vector g is Lipschitz continuous on N, that is,
there exists $L > 0$ such that

$$\| g(x) - g(y) \| \le L \| x - y \| \quad \forall x, y \in N$$

Then

$$\sum_{k=1}^{\infty} \cos^2 \theta_k \, \|g_k\|^2 < \infty \tag{7}$$

Now we are ready to prove the convergence of the proposed (ABBP) training algorithm, as stated in the following
theorem (1):

Theorem (1)

Suppose that the sequence $\{w_k\}_{k=1}^{\infty}$ is generated by the ABBP algorithm, and assume that the assumptions of
the Zoutendijk theorem are valid. Then

$$\lim_{k \to \infty} \|g_k\| = 0$$

Proof

From the second Wolfe condition (4),

$$(g_{k+1} - g_k)^T d_k \ge (c_2 - 1) g_k^T d_k \tag{8}$$

while the Lipschitz condition implies that

$$(g_{k+1} - g_k)^T d_k \le L \alpha_k \|d_k\|^2 \tag{9}$$

By combining (8) and (9),

$$\alpha_k \ge \frac{c_2 - 1}{L} \, \frac{g_k^T d_k}{\|d_k\|^2} \tag{10}$$

Substituting the inequality (10) in the first Wolfe condition (3) gives

$$E_{k+1} \le E_k - c_1 \frac{1 - c_2}{L} \, \frac{(g_k^T d_k)^2}{\|d_k\|^2}$$

$$\Rightarrow \quad E_{k+1} \le E_k - \delta \cos^2 \theta_k \, \|g_k\|^2$$

where $\delta = c_1 (1 - c_2) / L$. By summing the above expression over all the indices less than or equal to k we obtain

$$E_{k+1} \le E_1 - \delta \sum_{j=1}^{k} \cos^2 \theta_j \, \|g_j\|^2 \tag{11}$$

Since E is bounded below, $E_1 - E_{k+1}$ is less than some positive constant for all k. Hence, by taking limits in (11),

$$\sum_{k=1}^{\infty} \cos^2 \theta_k \, \|g_k\|^2 < \infty \tag{12}$$

Since $d_k = -g_k$, $\forall k$, we have $\cos \theta_k = 1$, and it follows from (12) that

$$\lim_{k \to \infty} \|g_k\| = 0$$
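Two steps of the proof are stated without detail. Written out under the same assumptions (Lipschitz gradient, $g_k^T d_k < 0$, $w_{k+1} - w_k = \alpha_k d_k$), they can be read roughly as follows; this is a reader's expansion of inequalities (9) and (10), not additional material from the paper.

```latex
% Why (9) holds: Cauchy-Schwarz, then the Lipschitz bound with w_{k+1} - w_k = \alpha_k d_k
(g_{k+1} - g_k)^T d_k \le \|g_{k+1} - g_k\| \, \|d_k\|
                      \le L \, \|w_{k+1} - w_k\| \, \|d_k\| = L \alpha_k \|d_k\|^2

% From (10) to the decrease bound: g_k^T d_k < 0, so multiplying (10) by
% c_1 g_k^T d_k reverses the inequality
c_1 \alpha_k \, g_k^T d_k \le c_1 \frac{c_2 - 1}{L} \frac{(g_k^T d_k)^2}{\|d_k\|^2}
                          = -\, c_1 \frac{1 - c_2}{L} \cos^2\theta_k \, \|g_k\|^2
% using (6): (g_k^T d_k)^2 = \cos^2\theta_k \, \|g_k\|^2 \, \|d_k\|^2
```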
4. EXPERIMENTAL RESULTS

In this section we present experimental results in order to evaluate the performance of the proposed
(ABBP) algorithm on three well-known test problems. The problems tested are the continuous function
approximation problem, the symmetry problem and the SPECT heart problem. On each problem, four algorithms have been
simulated: the standard back propagation (CBP), the momentum back propagation (MBP) [8], the
adaptive back propagation (ABP) [20] and the ABBP. For the simulations an HP PC compatible with Matlab Version 6.3
has been used. We have utilized Matlab Neural Network Toolbox Version 3.0 for the CBP, MBP and ABP algorithms. For the
heuristic parameters of these three algorithms, the Toolbox default values are used, unless stated otherwise.

The ABBP algorithm has been implemented in the Matlab environment, and the values of the parameters $c_1$, $c_2$ and
$\varepsilon$ have been fixed to $10^{-4}$, 0.5 and $10^{-5}$, respectively. At the start of each simulation, the weights of
the network have been initialized by the Nguyen-Widrow method [15]. Each algorithm has been tested with the same initial
weights. During training of the network, each time step is called an epoch and is defined to be a single sweep through all
training patterns. At the end of each epoch, the weights of the network are updated.

The reported parameters are Min, the minimum number of epochs; Mean, the mean number of epochs; Max, the
maximum number of epochs; Tav, the average total time; and Succ, the share of successful simulations out of 50 trials
(reported as a percentage in the tables) within the error function evaluations limit.

If an algorithm fails to converge within the above limit, it is considered to have failed to train the FFNN, and its
epochs are not included in the statistical analysis of the algorithm. One gradient and one error function evaluation are
necessary at each epoch.
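The reported quantities can be computed from the per-trial records as in the sketch below. The record layout (epochs, seconds, converged) and the function name `summarize_runs` are hypothetical, the demo numbers are made up for illustration and are not results from the paper, and averaging the time over successful runs only is an assumption, since the text does not say whether failed runs contribute to Tav.

```python
def summarize_runs(runs):
    """Summarize trials given as (epochs, seconds, converged) tuples.

    Failed trials count towards Succ but are excluded from the epoch
    statistics, as described in the text.
    """
    ok = [(e, s) for e, s, converged in runs if converged]
    succ = 100.0 * len(ok) / len(runs)          # success rate in percent
    if not ok:
        return {"Min": None, "Max": None, "Mean": None, "Tav": None, "Succ": succ}
    epochs = [e for e, _ in ok]
    return {
        "Min": min(epochs),
        "Max": max(epochs),
        "Mean": sum(epochs) / len(epochs),
        "Tav": sum(s for _, s in ok) / len(ok),  # assumption: successful runs only
        "Succ": succ,
    }

# Illustrative records only; the third trial hit the epoch limit and failed.
demo = summarize_runs([(80, 1.9, True), (120, 2.5, True), (2000, 30.0, False), (100, 2.2, True)])
```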

1. The Continuous Function Approximation Problem 

The first problem we consider is the approximation of the continuous function $F(x) = (3x - 1)/(5x + 2)$ on
the closed interval [0, 2]. This problem maps one real input to a single real output. The input values are 20 equally spaced
points $x_i \in [0, 2]$ and the target values are the images of these points under F. Thus we have 20
patterns, each consisting of one input $x \in [0, 2]$ and one target value F(x). The selected architecture of the
FFNN is 1-10-1, with logistic neurons with biases in the hidden layer and a linear output neuron with bias. The error
goal has been set to 0.001 and the maximum number of epochs to 2000. The results of the simulations are presented in Table 1.
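The training set for this problem can be generated as sketched below. Including both endpoints 0 and 2 among the 20 equally spaced points is an assumption, since the text does not say how the grid is placed, and the names `target_function`, `inputs` and `targets` are hypothetical.

```python
def target_function(x):
    """F(x) = (3x - 1) / (5x + 2); the denominator is positive on [0, 2]."""
    return (3.0 * x - 1.0) / (5.0 * x + 2.0)

n_patterns = 20
# Equally spaced inputs on [0, 2], endpoints included (assumption).
inputs = [2.0 * i / (n_patterns - 1) for i in range(n_patterns)]
targets = [target_function(x) for x in inputs]
patterns = list(zip(inputs, targets))       # 20 (input, target) pairs
```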


Table 1: The Continuous Function Approximation Problem

Algorithms   Min    Max     Mean      Tav      Succ
CBP          fail   -       -         -        0%
ABP          84     266     166.58    2.702    100%
MBP          692    1976    1439.7    117.50   43.33%
ABBP         68     127     59.43     2.041    100%


2. Symmetry Problem 

Consider a 4-input pattern and 1-output problem [13], where the output is required to be 'one' if the input pattern
configuration is symmetrical and 'zero' otherwise. The selected architecture of the FFNN is 16-6-1, with logistic neurons
with biases in the hidden and output layers. The error goal has been set to 0.01 and the maximum number of epochs to 2000.
Table 2: Results of Simulations for the Symmetry Problem

Algorithms   Min    Max     Mean      Tav      Succ
CBP          fail   -       -         -        -
ABP          117    653     224.97    1.863    100%
MBP          296    1502    1160.2    6.479    76.1%
ABBP         101    182     89.667    1.305    100%


3. SPECT Heart Problem

This data set contains data instances derived from cardiac Single Photon Emission Computed Tomography
(SPECT) images from the University of Colorado [14]. The network architecture for this medical classification problem
consists of one hidden layer with 6 neurons and an output layer of one neuron. The termination criterion is set to
$E_{rr} < 0.1$ within a limit of 2000 epochs. Table 3 summarizes the results of all algorithms.

Table 3: Results of Simulations for the Heart Problem

Algorithms   Min    Max    Mean      Tav/s    Succ
CBP          -      -      -         -        0%
ABP          98     497    321.8     0.927    98.65%
MBP          114    444    262.12    1.63     67.34%
ABBP         81     300    274.3     0.88     100%


5. CONCLUSIONS 

In this work we evaluated the performance of four algorithms for training FFNNs, which use different methods
for computing the learning rate. The analysis of the simulation results shows that computing the learning rate by the
proposed method together with the Wolfe conditions provides more stable, efficient and reliable learning.